Protein Tau’s Role in Gene Expression
Group 25: Ana Moral (s232119), Jacqueline Printz (s194377), Jenni Kinnunen (s204697), João Prazeres (s243036), William Gunns (s242051)
1. Introduction - Protein Tau
Function: Microtubule-associated protein essential for cytoskeletal stability and axonal transport.
Supports healthy neuronal function.
Destabilization linked to neuronal dysfunction and Alzheimer’s disease.
Previous studies reported that Tau destabilization alters the expression of glutamatergic genes.
Experimental Objective:
Is the overexpression of Tau associated with alterations in gene expression?
2. Experimental Setup
Differential gene expression analysis of RNA-seq data performed on:
- Control: 3 samples of SH-SY5Y cells expressing a control vector.
- Experimental Condition: 3 samples of SH-SY5Y cells overexpressing the Tau 0N4R isoform.
RNA-seq data were provided in 3 Excel (.xls) sheets:
- Read counts.
- RPM (Reads Per Million mapped reads).
- RPKM (Reads Per Kilobase of transcript per Million mapped reads).
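For reference, the two normalized measures are conventionally defined as follows, where $c_g$ is the read count of gene $g$, $L_g$ its length in base pairs, and $N$ the total number of mapped reads in the sample. These symbols are introduced here for illustration; the exact normalization used to produce the provided sheets is not stated.

$$
\mathrm{RPM}_g = \frac{c_g}{N} \times 10^{6},
\qquad
\mathrm{RPKM}_g = \frac{c_g}{\left(L_g / 10^{3}\right)\left(N / 10^{6}\right)} = \frac{c_g \times 10^{9}}{L_g \, N}
$$

RPM corrects only for sequencing depth, while RPKM also corrects for gene length, so the two measures can rank genes differently even within one sample.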
The 3 sheets were joined into one large tibble data frame.
# A tibble: 58,395 × 9
...1 GeneName description SH_ctrl_1 SH_ctrl_2 SH_ctrl_3 SH_tau_1 SH_tau_2
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ENSG000… TSPAN6 tetraspani… 319 582 280 214 189
2 ENSG000… TNMD tenomoduli… 0 0 0 0 0
3 ENSG000… DPM1 dolichyl-p… 792 1556 781 521 502
4 ENSG000… SCYL3 SCY1 like … 517 561 445 323 365
5 ENSG000… C1orf112 chromosome… 533 537 566 601 584
6 ENSG000… FGR FGR proto-… 0 0 0 2 2
7 ENSG000… CFH complement… 2 0 1 0 0
8 ENSG000… FUCA2 alpha-L-fu… 487 761 447 341 321
9 ENSG000… GCLC glutamate-… 430 703 246 233 218
10 ENSG000… NFYA nuclear tr… 1101 1156 760 898 583
# ℹ 58,385 more rows
# ℹ 1 more variable: SH_tau_3 <dbl>
3. Data Wrangling
First, the data was prepared and cleaned by:
1. joining all data frames into one
2. renaming columns
3. removing unnecessary columns and invalid observations (including the description column and rows in which all values are zero)
Key functions used include full_join, mutate, rename, and select. After these steps, each row corresponds to an observation, each column to a variable, and each cell to a value. (A picture of the clean data, in the lecture style, could be inserted here.)
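The cleaning steps above can be sketched in plain Python. This is a hypothetical analogue only: the project itself used dplyr in R, and the nested-dictionary layout, the `<measure>_<sample>` column names, and the helper names `join_sheets` and `drop_all_zero_rows` are illustrative assumptions. The sketch joins the sheets on the gene ID, renames each sheet's sample columns so they do not collide, keeps only the sample columns (dropping the description), and removes genes that are zero everywhere.

```python
# Hypothetical plain-Python sketch of the cleaning steps; the project itself
# used dplyr (full_join, rename, mutate, select) in R.

SAMPLES = ["SH_ctrl_1", "SH_ctrl_2", "SH_ctrl_3",
           "SH_tau_1", "SH_tau_2", "SH_tau_3"]


def join_sheets(sheets):
    """Outer-join {measure: {gene_id: row}} tables on gene_id.

    Sample columns are renamed to "<measure>_<sample>" so the read-count,
    RPM and RPKM columns do not collide. Only the sample columns are kept,
    which drops the description column. Genes absent from a sheet get None.
    """
    gene_ids = sorted(set().union(*(table.keys() for table in sheets.values())))
    joined = {}
    for gene in gene_ids:
        row = {}
        for measure, table in sheets.items():
            source = table.get(gene)
            for sample in SAMPLES:
                row[f"{measure}_{sample}"] = None if source is None else source[sample]
        joined[gene] = row
    return joined


def drop_all_zero_rows(joined):
    """Remove genes whose values are all zero (or missing)."""
    return {gene: row for gene, row in joined.items()
            if any(value not in (None, 0) for value in row.values())}


# Usage with two toy sheets (made-up values, not the real data).
zeros = {s: 0 for s in SAMPLES}
reads = {"ENSG_A": {"description": "gene A", **{s: 5 for s in SAMPLES}},
         "ENSG_B": {"description": "gene B", **zeros}}
rpm = {"ENSG_A": {"description": "gene A", **{s: 1.5 for s in SAMPLES}},
       "ENSG_B": {"description": "gene B", **zeros}}
clean = drop_all_zero_rows(join_sheets({"reads": reads, "rpm": rpm}))
print(sorted(clean))  # ['ENSG_A']
```

An outer join, like dplyr's full_join, keeps genes that appear in only some of the sheets; their missing values become None (NA in R) instead of the genes silently disappearing.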
The data was then log-transformed to make small differences between observations easier to analyze, and the mean (and variance) was calculated across the replicates of each experiment. These steps belong to the Data Augment stage.
4. Data Augment
::: columns
::: {.column width="40%" style="font-size: small;"}
::: incremental
- Normalized Data
  - Log transformation applied to selected columns
  - Small value (0.0001) added before the log transformation so that zero values do not produce log(0)
- Calculated Mean
  - Mean values for the control and Tau groups calculated across replicates
- Fold Change
  - Fold change between the control and Tau groups calculated for each measure
  - Fold-change values used to filter significant genes
- Filtered Data
  - Genes with a significant log fold change (> 1 or < -1) retained
  - Replicates averaged for non-significant genes
- Final Data
  - Data stored in three separate files for analysis
:::
:::

::: {.column width="60%" style="font-size: small;"}
[Figure]
:::
:::
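A minimal sketch of the augmentation steps, assuming a base-2 logarithm (the slides do not state the base) and a log fold change defined as the Tau mean minus the control mean; the function names and toy values are hypothetical. Under the base-2 assumption, the threshold of 1 in either direction corresponds to at least a two-fold change in the geometric mean.

```python
import math
from statistics import mean

PSEUDOCOUNT = 0.0001  # small value from the slide; avoids log(0) for zero counts


def log_transform(values):
    """log2(x + pseudocount) for each replicate (base 2 is an assumption)."""
    return [math.log2(v + PSEUDOCOUNT) for v in values]


def log_fold_change(ctrl, tau):
    """Mean log value of the Tau replicates minus that of the control replicates."""
    return mean(log_transform(tau)) - mean(log_transform(ctrl))


def filter_by_fold_change(table, threshold=1.0):
    """Keep genes whose log fold change is > threshold or < -threshold."""
    kept = {}
    for gene, (ctrl, tau) in table.items():
        lfc = log_fold_change(ctrl, tau)
        if abs(lfc) > threshold:
            kept[gene] = lfc
    return kept


# Toy replicates as (control, Tau); made-up values for illustration only.
toy = {
    "UP": ([10, 10, 10], [40, 40, 40]),
    "DOWN": ([8, 8, 8], [2, 2, 2]),
    "FLAT": ([10, 12, 11], [11, 10, 12]),
    "OFF": ([0, 0, 0], [0, 0, 0]),
}
kept = filter_by_fold_change(toy)
print({gene: round(lfc, 2) for gene, lfc in kept.items()})  # {'UP': 2.0, 'DOWN': -2.0}
```

Note that the "OFF" gene survives the log transformation only because of the pseudocount, and it is then discarded because its fold change is zero.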
5. Data Description part 1
All data
522,648 observations, 3 attributes
29,036 genes
18 experiments: 3 measures (reads, RPM, RPKM) × 2 conditions × 3 replicates
Filtered data
110,958 observations
19,379 genes
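The size of the full dataset is consistent with a long format holding one row per gene and experiment (the three attributes are assumed here to be gene, experiment, and value):

$$
29{,}036 \text{ genes} \times 18 \text{ experiments} = 522{,}648 \text{ observations}
$$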
6. Data Description part 2
[Insert a plot, with a description similar to that in the report.]
7. Analysis PCA
::: columns
::: {.column width="40%" style="font-size: small;"}
::: incremental
- Objective
  - Confirm that RPM, RPKM, and read counts yield similar results
  - Verify differences between the control and Tau experiments
- Approach
  - 3 PCAs performed separately on the RPM, RPKM, and read-count data
  - 1 final PCA performed on the combined data
- Results
  - Plots of the individual PCAs show how each data type clusters
  - The final PCA confirms global differences between the control and Tau groups
:::
:::

::: {.column width="60%" style="font-size: small;"}
[Figure]
:::
:::

---
8. Analysis PCA (continued)
[Possibly used for the PCA plots]
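The PCA approach described above can be sketched without external libraries (in R, base `prcomp` is a common choice; whether the project used it is not stated). This hypothetical helper centers each gene, builds the small sample-by-sample Gram matrix, and extracts the first principal component by power iteration. When condition is the main source of variation, control and Tau samples should land on opposite sides of PC1.

```python
import math


def pca_first_component(samples, n_iter=500):
    """PC1 scores for each sample (rows = samples, columns = genes).

    Uses the Gram matrix X X^T of the column-centered data, which stays
    small (n_samples x n_samples) even with tens of thousands of genes,
    and finds its leading eigenvector by power iteration.
    """
    n, p = len(samples), len(samples[0])
    # Center every gene so that PCA describes variation, not mean expression.
    col_means = [sum(row[j] for row in samples) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in samples]
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]

    # A constant start vector lies in the Gram matrix's null space after
    # centering, so start from a non-constant one instead.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(value * value for value in w))
        if norm == 0.0:  # all samples identical: no variance to describe
            return [0.0] * n
        v = [value / norm for value in w]

    # Rayleigh quotient gives the top eigenvalue (the squared singular value).
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n)) for i in range(n))
    # PC1 score of sample i is u_i * sigma_1 = u_i * sqrt(lambda_1).
    return [value * math.sqrt(max(eigenvalue, 0.0)) for value in v]


# Toy data: 3 control and 3 Tau samples measured on 3 genes (made-up values).
data = [[10, 5, 1], [11, 5, 1], [10, 6, 1],
        [2, 5, 8], [3, 5, 9], [2, 6, 8]]
scores = pca_first_component(data)
print([round(s, 1) for s in scores])
```

The sign of a principal component is arbitrary, so only the separation between groups is meaningful, not which group ends up positive.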
9. Gene Set Enrichment Analysis (GSEA)
Computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between the control and Tau-overexpressing conditions.
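The core GSEA statistic can be sketched as a weighted running sum over a ranked gene list. This is a minimal illustration, not the tool used in the project: the function name is hypothetical, the ranking metric (for example the log fold change) is an assumption, and the permutation step that turns the score into a p-value is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score used by GSEA.

    ranked_genes: gene names sorted by the ranking metric, highest first.
    scores:       the ranking metric of each gene, in the same order.
    gene_set:     the set of genes being tested.
    p:            weight exponent (p = 0 gives the unweighted statistic).
    Returns the running sum's maximum deviation from zero: positive when the
    set concentrates at the top of the list, negative when at the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    hit_total = sum(abs(s) ** p for s, hit in zip(scores, hits) if hit)
    if n_hits == 0 or n_misses == 0 or hit_total == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, extreme = 0.0, 0.0
    for s, hit in zip(scores, hits):
        # Step up by the gene's weighted share on a hit, down uniformly on a miss.
        running += abs(s) ** p / hit_total if hit else -1.0 / n_misses
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


genes = ["A", "B", "C", "D", "E"]
metric = [5.0, 4.0, 3.0, 2.0, 1.0]
print(round(enrichment_score(genes, metric, {"A", "B"}), 3))  # 1.0
print(round(enrichment_score(genes, metric, {"D", "E"}), 3))  # -1.0
```

A set clustered among the most up-regulated genes gives a score near +1, and one clustered among the most down-regulated genes gives a score near -1.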
::: {.left}
- Plot genes

::: {style="width: 45%; text-align: left;"}
:::
:::

::: {.right}
- Plot pathways

::: {style="width: 45%; text-align: right;"}
:::
:::
---
10. Discussion and Conclusion (based on the GSEA)
Which genes were upregulated? Are the results consistent with the literature?
Challenges
- [Challenge 1]
- [Challenge 2]
Limitations
- [Limitation 1]
- [Limitation 2]